Watermark time scale searching

ABSTRACT

Method and apparatus are described for compensating for a linear time scale change in a received signal, so as to correctly rescale the frame sequence of the received signal. Firstly, an initial estimate of the sequence of symbols is extracted from the received signal. Successive estimates of correctly time scaled sequences of the symbols are then generated by interpolating the values of the initial estimates.

The present invention relates to apparatus and methods for decoding information that has been embedded in information signals, such as audio, video or data signals.

Watermarking of information signals is a technique for the transmission of additional data along with the information signal. For instance, watermarking techniques can be used to embed copyright and copy control information into audio signals.

The main requirement of a watermarking scheme is that it is not observable (i.e. in the case of an audio signal, it is inaudible) whilst being robust to attacks to remove the watermark from the signal (e.g. removing the watermark will damage the signal). It will be appreciated that the robustness of a watermark will normally be a trade off against the quality of the signal in which the watermark is embedded. For instance, if a watermark is strongly embedded into an audio signal (and is thus difficult to remove) then it is likely that the quality of the audio signal will be reduced.

In digital devices, it is typically assumed that there exists up to a 1% drift in sampling (clock) frequency. During transmission of the signal through an analog channel, this drift is normally manifested as a stretch or shrink in the time domain signal (i.e. a linear time scale change). A watermark embedded in the time domain (e.g. in an audio signal) will be affected by this time stretch or shrink as well, which can make watermark detection very difficult or even impossible. Thus, in the implementation of a robust watermarking scheme, it is extremely important to find solutions to such time scale modifications.

In known time domain watermarking schemes, any linear time scale change within the signal is resolved by repeatedly running the watermark detection (including repeating the extraction of the watermark from the host signal) for the different possible time scales, until all the possible time scales are exhausted, or detection is achieved. Performing such searches over the possible time scaling ranges requires a large computational overhead, and is thus costly in terms of both hardware and computational time. Consequently, real time implementation of a watermark detector utilizing such a time scale search technique is not feasible.

In watermarking schemes implemented within the frequency domain, it is common to perform the scale search by modifying the frequency domain coefficients. For instance, this can be achieved by carefully shrinking or stretching the frequency domain samples. In principle, such a frequency domain solution could be directly applied to time domain watermark signals. However, since the watermarks are directly embedded in the time domain samples, the time scale search needs to be performed in the time domain as well. Normally, there are only a few thousand frequency domain samples, whilst the time domain signals contain samples in the order of millions. Consequently, such an application of the frequency domain solution to time domain signals is computationally too expensive.

It is an object of the present invention to provide a watermark decoding scheme for time domain watermarked signals that utilizes a time scale search that substantially addresses at least one of the problems of the prior art.

In a first aspect, the present invention provides a method of compensating for a linear time scale change in a received signal, the signal being modified by a sequence of symbols in the time domain, the method comprising the steps of: (a) extracting an initial estimate of the sequence of symbols from said received signal; and (b) forming an estimate of a correctly time scaled sequence of the symbols by interpolating the values of said initial estimate.

Preferably, step (b) is repeated so as to provide a range of estimates corresponding to different time scalings.

Preferably, said interpolation is at least one of zeroth order interpolation, linear interpolation, quadratic interpolation and cubic interpolation.

Preferably, the method further comprises the step of processing each estimate as though it were the correctly time scaled sequence of the symbols, so as to determine which estimate is the best estimate.

Preferably, the method further comprises the steps of correlating each of said estimates with a reference corresponding to said sequence of symbols; and taking the estimate with the maximum correlation peak as the best estimate.

Preferably, said initial estimate of the sequence of symbols is stored in a buffer.

Preferably, said buffer is of total length M, and the total number of scale searches conducted is

$N_{\eta} = \frac{M}{2}\left(\eta_{\max} - \eta_{\min}\right)$

where η_(min) and η_(max) correspond respectively to the minimum and maximum likely time scale modifications of the signal.
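By way of illustration only, the number of scale searches can be evaluated as in the following minimal sketch. The function name, the rounding up to a whole number of searches, and the example values (a buffer of 4096 symbols and a drift of plus or minus 1%) are assumptions made for the example and are not prescribed by the method.

```python
import math

def num_scale_searches(buffer_len, eta_min, eta_max):
    """Evaluate N_eta = (M / 2) * (eta_max - eta_min).

    Rounding up to an integer count is an assumption of this sketch.
    """
    return math.ceil(0.5 * buffer_len * (eta_max - eta_min))

# Hypothetical values: M = 4096 symbols, time scale range of -1% to +1%.
n_eta = num_scale_searches(4096, -0.01, 0.01)
print(n_eta)
```

The count grows linearly with both the buffer length and the width of the expected scale range, which is why the search is performed on the buffered symbol estimates rather than on the much longer sampled signal.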

Preferably, said initial estimate of the sequence of symbols comprises a sequence of N_(b) estimates for each symbol, each of the N_(b) estimates corresponding to a different time offset of a symbol.

Preferably, the scale search in the next detection window is adapted based on the information acquired during the current detection window.

Preferably, the scale space is searched using an optimal searching algorithm.

Preferably, the searching algorithm is the grid refinement algorithm.

In another aspect, the present invention provides a computer program arranged to perform the method as described above.

In further aspects, the present invention provides a record carrier comprising the computer program, and a method of making available for downloading the computer program.

In another aspect, the present invention provides an apparatus arranged to compensate for a linear time scale change in a received signal, the signal being modified by a sequence of symbols in the time domain, the apparatus comprising: an extractor arranged to extract an initial estimate of the sequence of symbols from said received signal; and an interpolator arranged to form an estimate of a correctly time scaled sequence of the symbols by interpolating the values of said initial estimate.

Preferably, the apparatus further comprises a buffer arranged to store one or more of said estimates.

In another aspect, the present invention provides a decoder comprising the apparatus as described above.

For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:

FIG. 1 is a diagram illustrating a watermark embedding apparatus;

FIG. 2 shows a signal portion extraction filter H;

FIGS. 3a and 3b show respectively the typical amplitude and phase responses as a function of frequency of the filter H shown in FIG. 2;

FIG. 4 shows the payload embedding and watermark conditioning stage of the apparatus shown in FIG. 1;

FIG. 5 is a diagram illustrating the details of the watermark conditioning apparatus H_(c) of FIG. 4, including charts of the associated signals at each stage;

FIGS. 6a and 6b show two preferred alternative window shaping functions s(n) in the form of respectively a raised cosine function and a bi-phase function;

FIGS. 7a and 7b show respectively the frequency spectra for a watermark sequence conditioned with a raised cosine and a bi-phase shaping window function;

FIG. 8 is a diagram illustrating a watermark detector in accordance with an embodiment of the present invention;

FIG. 9 diagrammatically shows the whitening filter H_(w) of FIG. 8, for use in conjunction with a raised cosine shaping window function;

FIG. 10 diagrammatically shows the whitening filter H_(w) of FIG. 8, for use in conjunction with a bi-phase window shaping function;

FIG. 11 shows details of the watermark symbol extraction and buffering processes in accordance with an embodiment of the present invention;

FIG. 12 illustrates a sequence in which estimates of watermark symbols are collected from four buffers when there is no time scale modification;

FIGS. 13a and 13b illustrate the different sequences, according to an embodiment of the present invention, in which estimates of watermark symbols can be collected from four buffers when there is respectively a time stretch and a time shrink time scale modification;

FIG. 14 shows an example of an efficient scale search technique based on the concept of grid refinement; and

FIG. 15 shows a typical shape of the correlation function output from the correlator of the watermark detector shown in FIG. 8.

FIG. 1 shows a block diagram of the apparatus required to perform the digital signal processing for embedding a multi-bit payload watermark w into a host signal x.

A host signal x is provided at an input 12 of the apparatus. The host signal x is passed in the direction of output 14 via the adder 22. However, a replica of the host signal x (input 8) is split off in the direction of the multiplier 18, for carrying the watermark information.

The watermark signal w_(c) is obtained from the payload embedder and watermark conditioning apparatus 6, and derived from a reference finite length random sequence w_(s) input to the payload embedder and watermark conditioning apparatus. The multiplier 18 is utilized to calculate the product of the watermark signal w_(c) and the replica audio signal x. The resulting product, w_(c)x, is then passed via a gain controller 24 to the adder 22. The gain controller 24 is used to amplify or attenuate the signal by a gain factor α.

The gain factor α controls the trade off between the audibility and the robustness of the watermark. It may be a constant, or variable in at least one of time, frequency and space. The apparatus in FIG. 1 shows that, when α is variable, it can be automatically adapted via a signal analyzing unit 26 based upon the properties of the host signal x. Preferably, the gain α is automatically adapted, so as to minimize the impact on the signal quality, according to a properly chosen perceptibility cost-function, such as a psycho-acoustic model of the human auditory system (HAS) in case of an audio signal. Such a model is, for instance, described in the paper by E. Zwicker, “Audio Engineering and Psychoacoustics: Matching signals to the final receiver, the Human Auditory System”, Journal of the Audio Engineering Society, Vol. 39, pp. 115-126, March 1991.

In the following, an audio watermark is utilized, by way of example only, to describe this embodiment of the present invention.

The resulting watermarked audio signal y is then obtained at the output 14 of the embedding apparatus 10 by adding an appropriately scaled version of the product of w_(c) and x to the host signal:

$y[n] = x[n] + \alpha\, w_c[n]\, x[n] \qquad (1)$

Preferably, the watermark w_(c) is chosen such that when multiplied with x, it predominantly modifies the short time envelope of x.

FIG. 2 shows one preferred embodiment in which the input 8 to the multiplier 18 in FIG. 1 is obtained by filtering a replica of the host signal x using a filter H in the filtering unit 15. If the filter output is denoted by x_(b), then according to this preferred embodiment, the watermarked signal is generated by adding the product of x_(b) and the watermark w_(c) to the host signal x:

$y[n] = x[n] + \alpha\, w_c[n]\, x_b[n] \qquad (2)$
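The embedding rule of equations (1) and (2) can be sketched as follows. This is an illustrative sketch only: the function name `embed` and the use of NumPy arrays are assumptions, and the band pass filter H that would produce x_(b) is not modelled, so x_(b) is simply passed in by the caller.

```python
import numpy as np

def embed(x, w_c, alpha, x_b=None):
    """Return y[n] = x[n] + alpha * w_c[n] * c[n].

    c is the host signal itself (equation (1)) when x_b is None,
    or the filtered replica x_b (equation (2)) otherwise.
    """
    x = np.asarray(x, dtype=float)
    carrier = x if x_b is None else np.asarray(x_b, dtype=float)
    return x + alpha * np.asarray(w_c, dtype=float) * carrier

# Hypothetical three-sample host signal and watermark.
y = embed([1.0, 2.0, -1.0], [1.0, -1.0, 0.5], alpha=0.1)
print(y)
```

Because the watermark multiplies the host signal rather than being added to it directly, the modification scales with the local signal level, which is how the scheme modulates the short time envelope.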

Let $\bar{x}_b$ be defined such that $\bar{x}_b = x - x_b$, and y_(b) be defined such that $y = y_b + \bar{x}_b$; then, from equation (2), the envelope modulated portion y_(b) of the watermarked signal y is given as

$y_b[n] = (1 + \alpha\, w_c[n])\, x_b[n] \qquad (3)$

Preferably, as shown in FIG. 3, the filter H is a linear phase band pass filter characterized by its lower cut-off frequency f_(L) and upper cut-off frequency f_(H). As can be seen in FIG. 3b, the filter H has a linear phase response with respect to frequency f within the pass-band (BW). Thus, when H is a band pass filter, x_(b) and $\bar{x}_b$ are the in-band and out-of-band components of the host signal respectively. For optimum performance, it is preferable that the signals x_(b) and $\bar{x}_b$ are in phase. This is achieved by appropriately compensating for the phase distortion produced by filter H. In the case of a linear phase filter, the distortion is a simple time delay.

In FIG. 4, the details of the payload embedder and watermark conditioning unit 6 are shown. In this unit, the initial reference random sequence w_(s) is converted into a multi-bit watermark signal w_(c).

Firstly a finite length, preferably zero mean and uniformly distributed random sequence w_(s), from now on also referred to as the watermark seed signal, is generated using a random number generator with an initial seed S. As will be appreciated later, it is preferable that this initial seed S is known to both the embedder and the detector, such that a copy of the watermark signal can be generated at the detector for comparison purposes. This results in the sequence of length L_(w):

$w_s[k] \in [-1, 1], \quad k = 0, 1, 2, \ldots, L_w - 1 \qquad (4)$

It should be noted that in some applications, the seed can be transmitted to the detector via an alternate channel or can be derived from the received signal using some pre-determined protocol.

Then the sequence w_(s) is circularly shifted by the amounts d₁ and d₂ using the circularly shifting unit 30 to obtain the random sequences w_(d1) and w_(d2) respectively. It will be appreciated that these two sequences (w_(d1) and w_(d2)) are effectively a first sequence and a second sequence, with the second sequence being circularly shifted with respect to the first. Each sequence w_(di), i=1,2, is subsequently multiplied with a respective sign bit r_(i), in the multiplying unit 40, where r_(i)=+1 or −1. The respective values of r₁ and r₂ remain constant, and only change when the payload of the watermark is changed. Each sequence is then converted into a periodic, slowly varying narrow-band signal w_(i) of length L_(w)T_(s) by the watermark conditioning circuit 20 shown in FIG. 4. Finally, the slowly varying narrow-band signals w₁ and w₂ are added with a relative delay T_(r) (where T_(r)<T_(s)) to give the multi-bit payload watermark signal w_(c). This is achieved by first delaying the signal w₂ by the amount T_(r) using delaying unit 45 and subsequently by adding it to w₁ with the adding unit 50.

FIG. 5 shows the watermark conditioning apparatus 20 used in the payload embedder and watermark conditioning apparatus 6 in more detail. The watermark seed signal w_(s) is input to the conditioning apparatus 20.

For convenience, the modification of only one of the sequences w_(di) is shown in FIG. 5, but it will be appreciated that each of the sequences is modified in a similar manner, with the results being added to obtain the watermark signal w_(c).

As shown in FIG. 5, each watermark signal sequence w_(di)[k], i=1,2, is applied to the input of a sample repeater 180. Chart 181 illustrates one of the sequences w_(di) as a sequence of values of random numbers between +1 and −1, with the sequence being of length L_(w). The sample repeater repeats each value within the watermark seed signal sequence T_(s) times, so as to generate a rectangular pulse train signal. T_(s) is referred to as the watermark symbol period and represents the span of the watermark symbol in the audio signal. Chart 183 shows the results of the signal illustrated in chart 181 once it has passed through the sample repeater 180.

A window shaping function s[n], such as a raised cosine window, is then applied to convert the rectangular pulse functions derived from w_(d1) and w_(d2) into slowly varying watermark sequence functions w₁[n] and w₂[n] respectively.

Chart 184 shows a typical raised cosine window shaping function, which is also of span T_(s).

The generated watermark sequences w₁[n] and w₂[n] are then added up with a relative delay T_(r) (where T_(r)<T_(s)) to give the multi-bit payload watermark signal w_(c)[n], i.e.

$w_c[n] = w_1[n] + w_2[n - T_r] \qquad (5)$

The value of T_(r) is chosen such that the zero crossings of w₁ match the maximum amplitude points of w₂ and vice-versa. Thus, for a raised cosine window shaping function T_(r)=T_(s)/2, and for a bi-phase window shaping function T_(r)=T_(s)/4. For other window shaping functions, other values of T_(r) are possible.
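The conditioning chain described above (circular shift, sign bit, sample repetition, window shaping and delayed addition) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the raised cosine is taken to be a Hann-shaped window that is zero at the frame edges and one at the centre, the direction of the circular shift is chosen arbitrarily, and the delay T_(r) is applied circularly on the basis that the conditioned signals are described as periodic.

```python
import numpy as np

def raised_cosine(ts):
    """Assumed window shape: 0 at the frame edges, 1 at the frame centre."""
    n = np.arange(ts)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / ts))

def condition(w_d, window):
    """Repeat each symbol T_s times, then shape every pulse with the window."""
    ts = len(window)
    return np.repeat(w_d, ts) * np.tile(window, len(w_d))

def payload_watermark(w_s, d1, d2, r1, r2, window, t_r):
    """Form w_c[n] = w_1[n] + w_2[n - T_r] as in equation (5)."""
    w1 = condition(r1 * np.roll(w_s, d1), window)
    w2 = condition(r2 * np.roll(w_s, d2), window)
    return w1 + np.roll(w2, t_r)  # circular delay (assumption)

# Hypothetical parameters: L_w = 16 symbols, T_s = 8 samples, T_r = T_s / 2.
rng = np.random.default_rng(0)
w_s = rng.uniform(-1.0, 1.0, 16)
w_c = payload_watermark(w_s, d1=3, d2=7, r1=1, r2=-1,
                        window=raised_cosine(8), t_r=4)
print(len(w_c))
```

With T_(r)=T_(s)/2 the peak of each w₁ pulse coincides with a zero of w₂, so at those sample positions w_(c) carries the value of a single symbol.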

As will be appreciated from the description below, during detection the correlation of w_(c)[n] will generate two correlation peaks that are separated by pL′ (as can be seen in FIG. 15). pL′ is an estimate of the circular shift pL between w_(d1) and w_(d2), which is part of the payload, and is defined as

$pL = |d_2 - d_1| \bmod \lceil L_w/2 \rceil \qquad (6)$

In addition to pL, extra information can be encoded by changing the relative signs of the embedded watermarks.

In the detector, this is seen as a relative sign r_(sign) between the correlation peaks. It may be defined as:

$r_{sign} = \frac{2\rho_1 + \rho_2 + 3}{2} \in \{0, 1, 2, 3\} \qquad (7)$

where ρ₁=sign(cL₁) and ρ₂=sign(cL₂) are respectively estimates of the sign bits r₁ (input 80) and r₂ (input 90) of FIG. 4, and cL₁ and cL₂ are the values of the correlation peak corresponding to w_(d1) and w_(d2) respectively. The overall watermark payload pL_(w), for an error-free detection, is then given as a combination of r_(sign) and pL:

$pL_w = \langle r_{sign},\, pL \rangle \qquad (8)$

The maximum information (I_(max)), in number of bits, that can be carried by a watermark sequence of length L_(w) is thus given by:

$I_{\max} = \log_2\left(4 \cdot \lceil L_w/2 \rceil\right) \text{ bits} \qquad (9)$
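Equations (6), (7) and (9) can be evaluated as in the following minimal sketch. The function names and the example values of d₁, d₂ and L_(w) are illustrative assumptions.

```python
import math

def shift_payload(d1, d2, l_w):
    """Equation (6): pL = |d2 - d1| mod ceil(L_w / 2)."""
    return abs(d2 - d1) % math.ceil(l_w / 2)

def relative_sign(rho1, rho2):
    """Equation (7): map the sign pair (rho1, rho2), each +1 or -1, to 0..3."""
    return (2 * rho1 + rho2 + 3) // 2

def max_payload_bits(l_w):
    """Equation (9): I_max = log2(4 * ceil(L_w / 2)) bits."""
    return math.log2(4 * math.ceil(l_w / 2))

# Hypothetical example with a watermark sequence of 1024 symbols.
print(shift_payload(10, 600, 1024))   # circular shift part of the payload
print(relative_sign(1, -1))           # relative sign part of the payload
print(max_payload_bits(1024))         # capacity in bits
```

The four sign combinations map onto the four values of r_(sign), which accounts for the factor of 4 (two extra bits) inside the logarithm of equation (9).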

In such a scheme, the payload is immune to relative offset between the embedder and the detector, and also to possible time scale modifications.

The window shaping function has been identified as one of the main parameters that controls the robustness and audibility behavior of the present watermarking scheme. As illustrated in FIGS. 6a and 6b, two examples of possible window shaping functions are herein described: a raised cosine function and a bi-phase function.

It is preferable to use a bi-phase window function instead of a raised cosine window function, so as to obtain a quasi DC-free watermark signal. This is illustrated in FIGS. 7a and 7b, showing the frequency spectra corresponding to a watermark sequence (in this case a sequence of w_(di)[k]={1,1,−1,1,−1,−1}) conditioned with respectively a raised cosine and a bi-phase window shaping function. As can be seen, the frequency spectrum for the raised cosine conditioned watermark sequence has a maximum at frequency f=0, whilst the frequency spectrum for the bi-phase shaped watermark sequence has a minimum at f=0, i.e. it has very little DC component.

Useful information is only contained in the non-DC component of the watermark. Consequently, for the same added watermark energy, a watermark conditioned with the bi-phase window will carry more useful information than one conditioned by the raised cosine window. As a result, the bi-phase window offers superior audibility performance for the same robustness or, conversely, it allows a better robustness for the same audibility quality.

Such a bi-phase function could be utilized as a window shaping function for other watermarking schemes. In other words, a bi-phase function could be applied to reduce the DC component of signals (such as a watermark) that are to be incorporated into another signal.

FIG. 8 shows a block diagram of a watermark detector (200, 300, 400). The detector consists of three major stages: (a) the watermark symbol extraction stage (200), (b) the buffering and interpolation stage (300), and (c) the correlation and decision stage (400).

In the symbol extraction stage (200), the received watermarked signal y′[n] is processed to generate multiple (N_(b)) estimates of the watermark sequence. These estimates of the watermark sequence are required to resolve any time offset that may exist between the embedder and the detector, so that the watermark detector can synchronize to the watermark sequence inserted in the host signal.

In the buffering and interpolation stage (300), these estimates are de-multiplexed into N_(b) separate buffers, and an interpolation is applied to each buffer to resolve time scale modifications that may have occurred, e.g. a drift in sampling (clock) frequency may have resulted in a stretch or shrink in the time domain signal (i.e. the watermark may have been stretched or shrunk).

In the correlation and decision stage (400), the content of each buffer is correlated with the reference watermark and the maximum correlation peaks are compared against a threshold to determine the likelihood of whether the watermark is indeed embedded within the received signal y′[n].

In order to maximize the accuracy of the watermark detection, the watermark detection process is typically carried out over a length of received signal y′[n] that is 3 to 4 times that of the watermark sequence length. Thus each watermark symbol to be detected can be constructed by taking the average of several estimates of said symbol. This averaging process is referred to as smoothing, and the number of times the averaging is done is referred to as the smoothing factor s_(f). Let L_(D) be the detection window length, defined as the length of the audio segment (in number of samples) over which a watermark detection truth-value is reported. Then, L_(D)=s_(f)L_(w)T_(s), where T_(s) is the symbol period and L_(w) the number of symbols within the watermark sequence. During symbol extraction, a factor T_(s) decimation takes place in the energy computation stage. Thus, the length (L_(b)) of each buffer 320 within the buffering and interpolation stage is L_(b)=s_(f)L_(w).

In the watermark symbol extraction stage 200 shown in FIG. 8, the incoming watermarked signal y′[n] is input to the optional signal conditioning filter H_(b) (210). This filter 210 is typically a band pass filter and has the same behavior as the corresponding filter (H, 15) shown in FIG. 2. The output of the filter H_(b) is y′_(b)[n] and, assuming linearity within the transmission medium, it follows from equations (1) and (3):

$y'_b[n] \approx y_b[n] = (1 + \alpha\, w[n])\, x_b[n] \qquad (10)$

Note that in the above expression, the possible time offset between the embedder and the detector is implicitly ignored. For ease of explanation of the general watermarking scheme principles, from now on, it is assumed that there is perfect synchronism between the embedder and the detector (i.e. no offset). An explanation is however given below, with reference to FIG. 11, of how to compensate for time offset in accordance with the present invention.

Note that when no filter is used in the embedder (i.e., when H=1) then H_(b) in the detector can also be omitted, or it can still be included to improve the detection performance. If H_(b) is omitted, then y_(b) in equation (10) is replaced with y. The rest of the processing is the same.

We assume that the audio signal is divided into frames of length T_(s), and that y′_(b,m)[n] is the n-th sample of the m-th filtered frame signal. The energy E[m] corresponding to the m-th frame is thus:

$E[m] = \sum_{n=0}^{T_s - 1} \left| y'_{b,m}[n] \right|^2 \qquad (11)$

Combining this with equation (10), it follows that:

$E[m] \approx \sum_{n=0}^{T_s - 1} \left| y_{b,m}[n] \right|^2 = \sum_{n=0}^{T_s - 1} \left| (1 + \alpha\, w_e[m])\, x_{b,m}[n] \right|^2 \qquad (12)$

where w_(e)[m] is the m-th extracted watermark symbol and contains N_(b) time-multiplexed estimates of the embedded watermark sequences. Solving for w_(e)[m] in equation (12) and ignoring higher order terms of α gives the following approximation:

$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{\sum_{n=0}^{T_s - 1} \left| y_{b,m}[n] \right|^2}{\sum_{n=0}^{T_s - 1} \left| x_{b,m}[n] \right|^2} - 1 \right) \qquad (13)$

In the watermark extraction stage 200 shown in FIG. 8, the output y′_(b)[n] of the filter H_(b) is provided as an input to a frame divider 220, which divides the audio signal into frames of length T_(s), i.e. into y′_(b,m)[n], with the energy calculating unit 230 then being used to calculate the energy corresponding to each of the framed signals as per equation (12). The output of this energy calculation unit 230 is then provided as an input to the whitening stage H_(w) (240) which performs the function shown in equation (13) so as to provide an output w_(e)[m]. Alternative implementations (240A, 240B) of this whitening stage are illustrated in FIGS. 9 and 10.

It will be realized that the denominator of equation (13) contains a term that requires knowledge of the host signal x. As the signal x is not available to the detector, the denominator of equation (13) must be estimated in order to calculate w_(e)[m].

Below is described how such an estimation can be achieved for the two described window shaping functions (the raised cosine window shaping function and the bi-phase window shaping function), but it will equally be appreciated that the teaching could be extended to other window shaping functions.

In relation to the raised cosine window shaping function shown in FIG. 6a, it has been realized that the audio envelope induced by the watermark contributes only to the noisy part of the energy function E[m]. The slowly varying part (i.e. the low frequency component) is predominantly due to the contribution of the envelope of the original audio signal x. Thus, equation (13) may be approximated by:

$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E[m]}{\mathrm{lowpass}(E[m])} - 1 \right) \qquad (14)$

where “lowpass(.)” is a low pass filter function. Thus, it will be appreciated that the whitening filter H_(w) for the raised cosine window shaping function can be realized as shown in FIG. 9.

As can be seen, such a whitening filter H_(w) (240A) comprises an input 242A for receiving the signal E[m]. A portion of this signal is then passed through the low pass filter 247A to produce a low pass filtered energy signal E_(LP)[m], which in turn is provided as an input to the calculation stage 248A along with the function E[m]. The calculation stage 248A then divides E[m] by E_(LP)[m] to calculate the extracted watermark symbol w_(e)[m].
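The frame energy computation of equation (11) and the low pass based whitening of equation (14) can be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, non-overlapping frames are used, and the low pass filter is taken to be a simple moving average with edge padding, since the text does not specify the filter. Frames with zero low pass energy are not handled.

```python
import numpy as np

def frame_energy(y_b, ts):
    """Equation (11): sum of squared samples in each frame of length T_s."""
    n_frames = len(y_b) // ts
    frames = np.asarray(y_b[: n_frames * ts], dtype=float).reshape(n_frames, ts)
    return np.sum(frames ** 2, axis=1)

def whiten_raised_cosine(energy, alpha, taps=9):
    """Equation (14), with a moving average standing in for lowpass(.).

    taps is assumed odd so the filtered sequence keeps the input length.
    """
    kernel = np.ones(taps) / taps
    padded = np.pad(energy, taps // 2, mode="edge")
    e_lp = np.convolve(padded, kernel, mode="valid")
    return (energy / e_lp - 1.0) / (2.0 * alpha)

# A constant-envelope signal carries no modulation, so w_e is zero everywhere.
energy = frame_energy(np.ones(40), ts=4)
w_e = whiten_raised_cosine(energy, alpha=0.1)
print(w_e)
```

Dividing by the slowly varying part of the energy removes the dependence on the host signal level, leaving a quantity that tracks the fast, watermark-induced envelope fluctuations.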

When a bi-phase window function is employed in the watermark conditioning stage of the embedder, a different approach should be utilized to estimate the envelope of the original audio, and hence to calculate w_(e)[m].

It will be seen by examination of the bi-phase window function shown in FIG. 6b, that when the audio envelope is modulated with such a window function, the first and the second halves of the frame are scaled in opposite directions. In the detector, this property is utilized to estimate the envelope energy of the host signal x.

Consequently, within the detector, each audio frame is first subdivided into two halves. The energy functions corresponding to the first and second half-frames are hence given by

$E_1[m] = \sum_{n=0}^{T_s/2 - 1} \left| y'_{b,m}[n] \right|^2 \qquad (15)$

and

$E_2[m] = \sum_{n=T_s/2}^{T_s - 1} \left| y'_{b,m}[n] \right|^2 \qquad (16)$

respectively. As the envelope of the original audio is modulated in opposite directions within the two sub-frames, the original audio envelope can be approximated as the mean of E₁[m] and E₂[m].

Further, the instantaneous modulation value can be taken as the difference between these two functions. Thus, for the bi-phase window function, the watermark w_(e)[m] can be approximated by:

$w_e[m] \approx \frac{1}{2\alpha} \left( \frac{E_1[m] - E_2[m]}{E_1[m] + E_2[m]} - 1 \right) \qquad (17)$

Consequently, the whitening filter H_(w) (240B) in FIG. 8 for a bi-phase window shaping function can be realized as shown in FIG. 10. Inputs 242B and 243B respectively receive the energy functions of the first and second half frames E₁[m] and E₂[m]. Each energy function is then split up into two, and provided to adders 245B and 246B which respectively calculate E₁[m]−E₂[m], and E₁[m]+E₂[m]. Both of these calculated functions are then passed to the calculating unit 248B which divides the value from adder 245B by the value from 246B so as to calculate w_(e)[m], containing N_(b) time-multiplexed estimates of the embedded watermark sequences, in accordance with equation (17).

This output w_(e)[m] is then passed to the buffering and interpolation stage 300 (FIG. 8), where the signal is de-multiplexed by a de-multiplexer 310, buffered in buffers 320 of length L_(b), so as to resolve a lack of synchronism between the embedder and the detector, and interpolated within the interpolation unit 330 so as to compensate for a time scale modification between the embedder and the detector.

In order to maximize the possible robustness of a watermark, it is important to make sure that the watermarking system is immune to both time offsets and drifts in sampling frequency between the embedder and the detector. In other words, the watermark detector must be able to synchronize to the watermark sequence inserted in the host signal.

FIG. 11 illustrates the process carried out by the buffering and interpolation stage 300 to resolve the offset issue. The example described illustrates the process for resolving offset when a raised cosine window shaping function has been employed in the watermark embedding process. However, in principle the same technique is applicable when the bi-phase window shaping function has been used.

Referring to FIG. 11, after filtering by the filter H_(b) 210, the incoming audio signal stream y′_(b)[n] is separated into preferably overlapping frames 302 of effective length T_(s) by the frame divider 220.

Preferably, to resolve possible offset between the embedder and the detector, each frame is divided into N_(b) sub-frames (304a, 304b, . . . , 304x), and the above computations (equations (12) to (17)) are applied on a sub-frame basis.

Preferably, each sub-frame overlaps with an adjacent sub-frame. In the example shown, it can be seen that there is a 50% overlap (T_(s)/N_(b)) of each sub-frame (304a, 304b, . . . , 304x), with each of the sub-frames being of length 2T_(s)/N_(b). When overlapping sub-frames are considered, the main frames are preferably longer than the symbol period T_(s) so as to allow inter-frame overlap as shown in FIG. 11.

The energy of the audio is then computed for each sub-frame by the whitening stage 240, and the resulting values are de-multiplexed into the N_(b) buffers 320 by the de-multiplexer 310. Each one (B₁, B₂, . . . , B_(Nb)) of the buffers 320 will thus contain a sequence of values, with the first buffer B₁ containing a sequence of values corresponding to the first sub-frame within each frame, the second buffer B₂ containing a sequence of values corresponding to the second sub-frame within each frame, etc.

If w_(Di) is the content of the i-th buffer, then it can be shown that:

$w_{Di}[k] = w_e[k \cdot N_b + i], \quad k \in \{0, \ldots, L_b - 1\} \qquad (18)$

where L_(b) is the buffer length.
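The de-multiplexing of equation (18) amounts to taking every N_(b)-th value of w_(e), starting at offset i. A minimal sketch follows; the function name is hypothetical and the buffer index i is taken to be zero-based here, which is an assumption about the indexing convention.

```python
import numpy as np

def demultiplex(w_e, n_b):
    """Equation (18): row i of the result holds w_e[k * N_b + i] for all k."""
    l_b = len(w_e) // n_b
    return np.asarray(w_e[: l_b * n_b]).reshape(l_b, n_b).T

# Hypothetical stream of 12 extracted values split over N_b = 4 buffers.
buffers = demultiplex(np.arange(12), n_b=4)
print(buffers)
```

Each row then holds one candidate estimate of the symbol sequence, corresponding to one sub-frame position and hence one time offset.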

For a raised cosine window shaping function, the energy of the embedded watermark is concentrated near the center of the frame, such that the sub-frame best aligned with the center of the frame will result in a distinctly better estimate of the embedded watermark symbol than all the other sub-frames. Effectively, each buffer thus contains an estimate of the symbol sequence, the estimates corresponding to the sequences having different time offsets.

The sub-frame best aligned with the center of the frame (i.e. the best estimate of the correctly aligned frame) is determined by correlating the contents of each buffer with the reference watermark sequence. The sequence with the maximum correlation peak value is chosen as the best estimate of the correctly aligned frame. The corresponding confidence level, as described below, is used to determine the truth-value of the detection. Preferably, the correlation process is halted once an estimated watermark sequence with a correlation peak above the defined threshold has been found.
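Selecting the best buffer by correlation can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, a circular correlation computed with the FFT is assumed, each estimate is assumed to have already been reduced to the same length as the reference sequence, and the peak is taken on the absolute value because the embedded sign bits may invert it. Thresholding and early termination are omitted.

```python
import numpy as np

def circular_correlation(estimate, reference):
    """Circular cross-correlation of two equal-length real sequences."""
    spec = np.fft.fft(estimate) * np.conj(np.fft.fft(reference))
    return np.real(np.fft.ifft(spec))

def best_buffer(buffers, reference):
    """Return (index, peak) of the estimate with the largest correlation peak."""
    peaks = [np.max(np.abs(circular_correlation(b, reference))) for b in buffers]
    best = int(np.argmax(peaks))
    return best, peaks[best]

# Hypothetical data: three weak noise estimates and one shifted reference copy.
rng = np.random.default_rng(1)
ref = rng.uniform(-1.0, 1.0, 64)
candidates = [0.1 * rng.uniform(-1.0, 1.0, 64) for _ in range(4)]
candidates[2] = np.roll(ref, 5)
best, peak = best_buffer(candidates, ref)
print(best, peak)
```

Because the correlation is circular, the position of the peak also reveals the circular shift of the estimate relative to the reference, which is how the shift part of the payload can be read out.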

Typically, the length of each buffer is between 3 and 4 times the watermark sequence length L_(w), and is thus typically between 2048 and 8192 symbols, and N_(b) is typically within the range of 2 to 8.

The buffer is normally 3 to 4 times the length of the watermark sequence so that each watermark symbol can be constructed by taking the average of several estimates of said symbol. This averaging process is referred to as smoothing, and the number of times the averaging is done is referred to as the smoothing factor s_(f). Thus, given the buffer length L_(b) and the watermark sequence length L_(w), the smoothing factor s_(f) is such that:
$L_{b} = s_{f} L_{w}$  (19)
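The smoothing step might be sketched as below. The sketch assumes that a buffer of length L_(b)=s_(f)L_(w) holds s_(f) consecutive repetitions of the watermark sequence, so that the j-th symbol is estimated by averaging the entries at positions j, j+L_(w), j+2L_(w), and so on; the function name `smooth` is hypothetical.

```python
def smooth(buffer, l_w):
    """Average s_f estimates of each watermark symbol.

    Assumes len(buffer) == s_f * l_w (equation (19)) and that the watermark
    sequence repeats with period l_w inside the buffer (an assumption made
    for this sketch).
    """
    s_f = len(buffer) // l_w  # smoothing factor
    return [
        sum(buffer[j + r * l_w] for r in range(s_f)) / s_f
        for j in range(l_w)
    ]


# Two noisy repetitions of a two-symbol sequence (s_f = 2, L_w = 2).
smoothed = smooth([1.0, 2.0, 3.0, 5.0], l_w=2)  # [2.0, 3.5]
```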

In another preferred embodiment, the detector refines the parameters used in the offset search based upon the results of a previous search step. For instance, if a first series of estimates shows that the results stored in buffer B₃ provide the best estimate of the information signal, then the next offset search (either on the same received signal, or on the signal received during the next detection window) is refined by shifting the position of the sub-frames towards the position of the best estimate sub-frame. The estimates of the sequence having zero offset can thus be iteratively improved.

As previously mentioned, there can exist a drift in sampling (clock) frequency in digital devices, which results in a stretch or shrink in the time domain signal.

For instance, consider an audio segment s of length L that is time scaled such that its new length becomes L_(η)=L(1+η), where η is the time scaling factor, with η being a constant such that 1+η>0; for a time stretch η>0, and for a time shrink η<0.

When the signal is not time scale modified (η=0), N_(b) estimates of the watermark sequence are constructed by collecting the symbols stored in the N_(b) buffers separately.

FIG. 12 illustrates four buffers (B1, B2, B3, B4), each buffer shown as a row of boxes, with each box within a row indicating a separate location within the respective buffer. The sequences w_(I1), w_(I2), w_(I3), w_(I4) are respective estimates of the watermark sequence. In the example shown in FIG. 12, it is assumed that the signal is not time scale modified, and hence each estimate (w_(I1), w_(I2), w_(I3), w_(I4)) represents an estimate of the watermark sequence with a different time offset.

Consequently, each estimate (that is passed to the correlator 410) is formed by sequentially collecting the entries from each buffer. For example, the first value in sequence w_(I1) (w_(I1)[1]) is collected from the first location of B1, the second (w_(I1)[2]) from the second location of B1, etc., with the final value (w_(I1)[L_(b)]) being collected from the final location of the buffer. It will be appreciated that the arrows, which connect each box in a row to the neighboring box, show the direction in which values of the sequence estimates are collected from the buffer locations. It will also be appreciated that, whilst only eleven buffer locations are shown for each buffer, the size of the buffers in practice is likely to be significantly larger than this. For example, in the preferred embodiment, the length of each buffer is typically between 2048 and 8192 locations, with the number of buffers typically being between 2 and 8. However, in order to prevent overflow of buffers during time scale search, the actual buffer lengths are set to (1+|η_(max)|) times the typical lengths specified above, where η_(max) is the expected maximum scaling factor.

When the received signal y′[n] has been time scale modified, it is necessary to perform a time scale search in order to correctly estimate the watermark sequence. In the present invention, such a search is performed by systematically combining the extracted watermark sequence estimates (w_(e)[m]), preferably by systematically combining (interpolating) the different estimates of the watermark sequences stored in the buffers.

Such time scale searches can be performed by utilizing any order of interpolation. In the following two preferred embodiments, two orders of interpolation will be described: the first order (linear) interpolation and the zero order interpolation. However, it will be appreciated that this technique can be extended to higher orders of interpolation, e.g. quadratic and cubic interpolation.

In the first embodiment, estimates of the time scaled watermark sequence are provided by applying linear interpolation to the previously extracted estimates of the watermark sequence.

To this end, it can be assumed that the intermediate values w_(e)[k] generated by the symbol extraction step shown in FIG. 8 are sequentially stored in a single buffer of length M in place of the N_(b) buffers. In other words, the N_(b) buffers are multiplexed into a single buffer of length M=N_(b)s_(f)L_(w), where L_(w) and s_(f) are as defined earlier. Let the resulting sequence be represented by w_(D). It can now be assumed that w_(D) represents discrete samples of an otherwise continuous function. During time scale modification, these discrete points are either pushed towards each other or stretched out. This in turn translates to a re-sampling of the watermark function.

In this embodiment, re-sampling is realized via a linear interpolation technique. That is, given the watermark sequence w_(D)[m], m=1, . . . , M, an interpolated watermark sequence w₁[m] is generated as
$w_{1}[m] = \mu\, w_{D}(\lfloor (1+\eta)m \rfloor) + (1-\mu)\, w_{D}(\lceil (1+\eta)m \rceil)$  (20)

where μ=⌈(1+η)m⌉−(1+η)m, and ⌊·⌋ and ⌈·⌉ are the floor and the ceiling operators, respectively. After the interpolation, the watermark sequences are folded back into the N_(b) buffers in a similar way to that shown in FIG. 11. Let the interpolated watermark sequence folded into the buffer b∈{0, . . . , N_(b)−1} be denoted by w_(1,b)[k]; then it can be shown that
$w_{1,b}[k] = \mu\, w_{D}(\lfloor (N_{b}k+b)(1+\eta) \rfloor) + (1-\mu)\, w_{D}(\lceil (N_{b}k+b)(1+\eta) \rceil)$  (21)
Let, for b=1, . . . , N_(b), w_(D,b)[k] be the pre-interpolation sequence stored in the b-th buffer, and let q_(bk)∈{1, . . . , s_(f)L_(w)} and r_(bk)∈{1, . . . , N_(b)} be defined as
$q_{bk} = \left\lfloor \frac{\lfloor (N_{b}k+b)(1+\eta) \rfloor}{N_{b}} \right\rfloor$
and
$r_{bk} = \lfloor (N_{b}k+b)(1+\eta) \rfloor - N_{b} \left\lfloor \frac{\lfloor (N_{b}k+b)(1+\eta) \rfloor}{N_{b}} \right\rfloor.$
Then, it can be shown that w_(D)(⌊(N_(b)k+b)(1+η)⌋)=w_(D,r_(bk))[q_(bk)]. Putting this into equation (21), it follows that
$w_{1,b}[k] = \mu\, w_{D,r_{bk}}[q_{bk}[k]] + (1-\mu)\, w_{D,(r_{bk}+1)}[q_{bk}+1]$  (22)
Thus, the interpolated buffer entries can be calculated directly from the N_(b) sequences w_(D,b), b=1, . . . , N_(b) (as shown in FIG. 8, being passed to the correlator 410), by solving equation (22).
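The linear re-sampling of equation (20) on the single multiplexed buffer can be sketched as follows. This is an illustrative sketch only: the function name `resample_linear` is hypothetical, indices are zero-based rather than one-based, and read positions that fall beyond the end of the buffer are clamped to the last entry, which is an assumption not stated in the description.

```python
import math


def resample_linear(w_d, eta):
    """Linearly interpolate w_d at positions (1 + eta) * m, per equation (20).

    mu = ceil(x) - x weights the floor sample and (1 - mu) weights the
    ceiling sample. Positions past the end of the buffer are clamped to the
    last entry (an assumption made for this sketch).
    """
    last = len(w_d) - 1
    out = []
    for m in range(len(w_d)):
        x = (1.0 + eta) * m
        lo = min(math.floor(x), last)
        hi = min(math.ceil(x), last)
        mu = math.ceil(x) - x
        out.append(mu * w_d[lo] + (1.0 - mu) * w_d[hi])
    return out


# With eta = 0 the sequence is returned unchanged; with eta = 0.5 every
# output sample is read 1.5 times further along the buffer.
stretched = resample_linear([0.0, 10.0, 20.0, 30.0, 40.0], eta=0.5)
```

The interpolated sequence would then be folded back into the N_(b) buffers before being passed to the correlator.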

A further embodiment of the present invention will now be described, in which estimates of the time scaled watermark sequence are provided by applying zero order interpolation to the previously extracted estimates of the watermark sequence. This approach can be represented by equation (22) with μ=1. In this case, the interpolation function can be written as
$w_{1,b}[k] = w_{D,r_{bk}}[q_{bk}[k]]$,  (23)
where q_(bk)∈{1, . . . , s_(f)L_(w)} and r_(bk)∈{1, . . . , N_(b)} are as defined above.
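A minimal sketch of the zero order interpolation of equation (23) is given below. It selects, for each output entry, the pre-interpolation buffer r and location q obtained from the floor of (N_(b)k+b)(1+η). The function name `zero_order_estimates`, the zero-based indexing, and the clamping of q at the end of the buffer are assumptions made for illustration.

```python
import math


def zero_order_estimates(buffers, eta):
    """Form time scaled estimates by nearest-lower-sample selection.

    For buffer b and location k, idx = floor((N_b * k + b) * (1 + eta)) is
    split into q = idx // N_b (location) and r = idx % N_b (buffer), and
    the entry buffers[r][q] is taken. q is clamped to the last location
    (an assumption made for this sketch).
    """
    n_b = len(buffers)
    l_b = len(buffers[0])
    out = []
    for b in range(n_b):
        seq = []
        for k in range(l_b):
            idx = math.floor((n_b * k + b) * (1.0 + eta))
            q, r = divmod(idx, n_b)
            seq.append(buffers[r][min(q, l_b - 1)])
        out.append(seq)
    return out


# Two buffers holding the de-multiplexed values 0..7.
bufs = [[0, 2, 4, 6], [1, 3, 5, 7]]
unscaled = zero_order_estimates(bufs, eta=0.0)   # identical to bufs
stretched = zero_order_estimates(bufs, eta=0.5)
```

No multiplications of buffer contents are needed here, only index arithmetic, which is why the zero order variant is cheap compared with linear interpolation.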

A graphical interpretation of equation (23) is shown in FIGS. 13 a & b. FIG. 13 a shows how the different estimates of the correct watermark sequences (w_(I1), w_(I2), w_(I3), w_(I4)) are extracted from the buffers for a time stretch, whilst FIG. 13 b shows similar information for a time shrink. As in FIG. 12, each row of boxes represents a respective buffer, with each box representing a location within each buffer. The arrows indicate the order in which the buffer contents are collected to form the estimates of the watermark sequences.

When the audio signal is time scale modified, the start and the end of the framing will gradually drift backward or forward, depending respectively upon whether the signal is time scale stretched or compressed. The watermark symbol combining stage according to this embodiment tracks the size of the drift. When the absolute value of the cumulative drift exceeds T_(s)/N_(b) (where N_(b) is the number of buffers, i.e. the number of consecutive symbols that represent a single watermark symbol), then the symbol collection sequence from the buffers is adjusted to provide the next best estimate of the symbol from the buffers. In other words, the buffer counters are incremented or decremented (depending on drift direction), and a circular rotation of the buffer pointer for each watermark sequence estimation (w_(I1), w_(I2), w_(I3), w_(I4)) is performed.

Let k be the buffer entry counter, where k is an integer representing each location within each buffer, i.e. k=1 represents the first location within each buffer, k=2 the second, etc. If the estimates of the watermark sequence are being taken from the buffers with no time scale modification (as shown in FIG. 12), then it will be appreciated that the values in the first sequence can be represented by w_(I1)[k].

However, for time scaled estimates, assuming that an estimate η is being made of the time scale, then when $|\eta k| \approx \frac{n}{N_{b}}$, where n is any integer (and in this example N_(b)=4), the counter values and the buffers from which the watermark estimates are taken are changed.

If η is positive (time stretch), the counter for the first buffer is incremented. The ordering of the buffers is also circularly shifted (i.e. the watermark sequence estimate w_(I1) previously being taken from buffer one will now be taken from buffer four, the estimate from buffer two will now be taken from buffer one, the estimate from buffer three will now be taken from buffer two, and the estimate from buffer four will now be taken from buffer three). A similar circular shift is also performed on the buffer counter k. This is shown diagrammatically in FIG. 13 a.

If η is negative (time shrink), the counter for the first buffer is incremented, and the ordering of the buffers is circularly shifted (i.e. the watermark sequence estimate w_(I1) previously being taken from buffer one will now be taken from buffer two, the estimate from buffer two will now be taken from buffer three, the estimate from buffer three will now be taken from buffer four, and the estimate from buffer four will now be taken from buffer one). A similar circular shift is also performed on the buffer counter k. This is shown diagrammatically in FIG. 13 b.

After these circular shifts and adjustments to the buffer counters have been performed, the symbol collection to form the different estimates of the watermark sequences continues from left to right until |ηk|≈(n+1)/N_(b) (i.e. the next interchange position is reached). The process of buffer order interchanging and the sequential symbol collection is then repeated until the end of the buffer is reached.

Consequently, it will be appreciated that a zeroth order interpolation of the time scaled watermark sequence has been performed. In other words, the time scaled watermark sequence has been estimated by selecting those values from the original, non time scaled watermark sequence estimates that would most closely correspond to the temporal positions of the time scaled watermark sequence. By utilizing previously extracted estimates of the watermark sequence, such a technique efficiently resolves the problems of estimating correctly time scaled watermarks, with minimal cost in terms of computational overhead.

Such estimates of the time scaled watermark sequence will then be passed to the correlator (410), so as to determine whether the predicted time scale η accurately represents the time scale of the received signal, i.e. whether the estimates provided to the correlator produce good correlation peaks. If not, then the time scale search will be repeated for a different estimated value, i.e. a different value of η.

Due to possible time scale modification, the detection truth-value (whether or not the signal includes a watermark) is determined only after the appropriate scale search has been conducted. Let Δη be the scale search step size and let us assume that we want the watermark to survive all the scale modifications in the interval [η_(min), η_(max)]. The total number of visited scales is then given by
$N_{\eta} = \frac{\eta_{\max} - \eta_{\min}}{\Delta\eta}$  (24)
To minimize N_(η) it is preferred to find the maximum value of Δη that can still allow an exhaustive scale search. To this end, experimental results show that the detection performance is not significantly affected if the time scaling does not exceed half of the inverse of the buffer length. This means that, for an exhaustive scale search, Δη should be such that
$\Delta\eta \leq \frac{2}{N_{b} s_{f} L_{w}}$
Putting this into equation (24), it follows that it is preferable to conduct a search over
$N_{\eta} = \frac{N_{b} s_{f} L_{w}}{2} (\eta_{\max} - \eta_{\min})$  (25)
time scales in order to conduct an exhaustive scale search. Clearly, any scale search can be time consuming. Thus, the complexity issue and cost in computing overhead should be considered when choosing the watermark embedding parameters N_(b), s_(f) and L_(w).
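As a worked illustration of equations (24) and (25), the sketch below computes the number of scales for one set of parameter values. The values N_(b)=4, s_(f)=4, L_(w)=512 and a search interval of ±1% are chosen here purely for illustration, and the function name `num_scales` is hypothetical; rounding up to a whole number of scales is likewise an assumption.

```python
import math


def num_scales(n_b, s_f, l_w, eta_min, eta_max):
    """Number of scales visited in an exhaustive search, per (24) and (25).

    Uses the largest admissible step, delta_eta = 2 / (N_b * s_f * L_w),
    and rounds the result up to a whole number of scales (an assumption
    made for this sketch).
    """
    total_length = n_b * s_f * l_w      # M = N_b * s_f * L_w
    delta_eta = 2.0 / total_length      # maximum step size
    return math.ceil((eta_max - eta_min) / delta_eta)


# M = 4 * 4 * 512 = 8192, so N_eta = 4096 * 0.02 = 81.92, rounded up to 82.
n_eta = num_scales(n_b=4, s_f=4, l_w=512, eta_min=-0.01, eta_max=0.01)
```

Doubling any one of N_(b), s_(f) or L_(w) doubles the number of scales, which is the cost trade-off noted above.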

In one preferred embodiment the scale search is adapted such that information acquired during detection is utilized to plan an optimum search in the subsequent detection windows. For example, the scale search in the next detection window is started around the current optimum scale.

An alternative embodiment illustrated in FIG. 14 provides a method for an efficient walk through the scale space by grid refinement. The most straightforward solution is a linear search from the minimum scale towards the maximum scale by repeatedly adding an incremental step. Assuming that the correlation, and thus the confidence level, does not change abruptly from one scale to the next, one can considerably reduce the number of scales visited during the search by reducing the space granularity. As shown in FIG. 14, the algorithm starts at scale zero and is repeated until a minimum granularity is reached or the watermark is detected (i.e., a local maximum for the confidence level is found) and/or the confidence level exceeds a predetermined threshold. When one has an indication of where to start the scale search (e.g. an initial estimation from a previous detection), a random or linear search around this scale may suffice.
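One possible coarse-to-fine reading of such a grid refinement is sketched below. It is not a reproduction of the flow of FIG. 14: the function name `grid_refinement_search`, the halving of the step at each pass, the threshold-only stopping rule, and the `confidence` callback (which would wrap the interpolation and correlation stages) are all assumptions made for illustration.

```python
def grid_refinement_search(confidence, eta_min, eta_max, min_step, threshold):
    """Coarse-to-fine search over the scale interval [eta_min, eta_max].

    Starts at scale zero, then visits a grid whose step is halved on each
    pass, skipping scales already visited. Stops when the confidence level
    reaches the threshold or the step falls below min_step.
    Returns (best_eta, best_confidence, number_of_scales_visited).
    """
    visited = {}

    def evaluate(eta):
        key = round(eta, 12)  # merge grid points that differ only by rounding
        if key not in visited:
            visited[key] = confidence(eta)
        return visited[key]

    best_eta, best_cl = 0.0, evaluate(0.0)
    step = eta_max - eta_min
    while best_cl < threshold and step >= min_step:
        n = int(round((eta_max - eta_min) / step))
        for i in range(n + 1):
            eta = eta_min + i * step
            cl = evaluate(eta)
            if cl > best_cl:
                best_eta, best_cl = eta, cl
            if best_cl >= threshold:
                break
        step /= 2.0
    return best_eta, best_cl, len(visited)


# Toy confidence function with a broad peak around eta = 0.005.
def toy_confidence(eta):
    return 20.0 if abs(eta - 0.005) < 0.0013 else 1.0


result = grid_refinement_search(toy_confidence, -0.01, 0.01, 0.001, 8.7)
```

In this toy case the peak is found after visiting far fewer scales than the 21 that a linear search with step 0.001 would visit, but only because the peak is wider than the coarse grid spacing at which it is first hit.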

As shown in FIG. 8, outputs (w_(D1), w_(D2), . . . , w_(DNb)) from the buffering stage are passed to the interpolation stage and, after interpolation, the outputs (w_(I1), w_(I2), . . . , w_(INb)) of this stage, which are needed to resolve a possible time scale modification in the watermarked signal, are passed to the correlation and decision stage. All of the estimates (w_(I1), w_(I2), . . . , w_(INb)) of the watermark corresponding to the different possible offset values are passed to the correlation and decision stage 400.

The correlator 410 calculates the correlation of each estimate w_(Ij), j=1, . . . , N_(b), with respect to the reference watermark sequence w_(c)[k]. Each respective correlation output corresponding to each estimate is then applied to the maximum detection unit 420, which determines which two estimates provided the maximum correlation peak values. These estimates are chosen as the ones that best fit the circularly shifted versions w_(d1) and w_(d2) of the reference watermark. The correlation values for these estimated sequences are passed to the threshold detector and payload extractor unit 430.

The reference watermark sequence w_(s) used within the detector corresponds to (a possibly circularly shifted version of) the original watermark sequence applied to the host signal. For instance, if the watermark signal was calculated using a random number generator with seed S within the embedder, then equally the detector can calculate the same random number sequence using the same random number generation algorithm and the same initial seed S so as to determine the watermark signal. Alternatively, the watermark signal originally applied in the embedder and utilized by the detector as a reference could simply be any predetermined sequence.

FIG. 15 shows a typical shape of a correlation function as output from the correlator 410. The horizontal scale shows the correlation delay (in terms of the sequence samples). The vertical scale on the left hand side (referred to as the confidence level cL) represents the value of the correlation peak normalized with respect to the standard deviation of the normally distributed correlation function.
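The correlation and its normalization into a confidence level can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, a direct circular correlation is used for clarity rather than efficiency, and the standard deviation is computed over all correlation values including the peaks, which the description does not specify.

```python
import statistics


def circular_correlation(estimate, reference):
    """Circular cross-correlation of an estimate with the reference.

    Entry d is the sum over k of estimate[(k + d) % n] * reference[k].
    """
    n = len(reference)
    return [
        sum(estimate[(k + d) % n] * reference[k] for k in range(n))
        for d in range(n)
    ]


def confidence_levels(estimate, reference):
    """Correlation values normalized by their standard deviation.

    The standard deviation is taken over the whole correlation function,
    peaks included (an assumption made for this sketch).
    """
    corr = circular_correlation(estimate, reference)
    sigma = statistics.pstdev(corr)
    if sigma == 0:
        raise ValueError("correlation function is constant")
    return [c / sigma for c in corr]


# A length-7 bipolar reference and an estimate equal to the reference
# circularly shifted by three positions.
reference = [1, 1, 1, -1, 1, -1, -1]
estimate = [reference[(j - 3) % 7] for j in range(7)]
cl = confidence_levels(estimate, reference)
peak_delay = cl.index(max(cl))  # delay of the correlation peak
```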

As can be seen, the typical correlation is relatively flat with respect to cL, and centered about cL=0. However, the function contains two peaks, which are separated by pL (see equation 6) and extend upwards to cL values that are above the detection threshold when a watermark is present. When the correlation peaks are negative, the above statement applies to their absolute values.

A horizontal line (shown in the Fig. as being set at cL=8.7) represents the detection threshold. The detection threshold value controls the false alarm rate.

Two kinds of false alarms exist: the false positive rate, defined as the probability of detecting a watermark in non watermarked items, and the false negative rate, which is defined as the probability of not detecting a watermark in watermarked items. Generally, the requirement on the false positive alarm is more stringent than that on the false negative. The scale on the right hand side of FIG. 15 illustrates the probability of a false positive alarm p. As can be seen in the example shown, the probability of a false positive p=10⁻¹² is equivalent to the threshold cL=8.7, whilst p=10⁻⁸³ is equivalent to cL=20.

After each detection interval, the detector determines whether the original watermark is present or whether it is not present, and on this basis outputs a “yes” or a “no” decision. If desired, to improve this decision making process, a number of detection windows may be considered. In such an instance, the false positive probability is a combination of the individual probabilities for each detection window considered, dependent upon the desired criteria. For instance, it could be determined that if the correlation function has two peaks above a threshold of cL=7 on any two out of three detection intervals, then the watermark is deemed to be present. Such detection criteria can be altered depending upon the desired use of the watermark signal and to take into account factors such as the original quality of the host signal and how badly the signal is likely to be corrupted during normal transmission.
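The two-out-of-three criterion given as an example above might be sketched as follows. The function name `watermark_present` and the representation of each detection interval as a list of its largest correlation peak values are assumptions made for illustration; absolute values are used so that negative peaks are treated in the same way as positive ones.

```python
def watermark_present(window_peaks, threshold=7.0, required_windows=2):
    """Decide presence from several detection intervals.

    window_peaks holds, for each detection interval, the confidence levels
    of its correlation peaks. An interval counts as a hit when at least two
    peaks reach the threshold in absolute value; the watermark is deemed
    present when at least required_windows intervals are hits.
    """
    hits = 0
    for peaks in window_peaks:
        strong = sum(1 for p in peaks if abs(p) >= threshold)
        if strong >= 2:
            hits += 1
    return hits >= required_windows


# Three detection intervals; the first and third each show two strong peaks.
decision = watermark_present([[9.1, -8.0], [3.0, 2.0], [7.5, 7.2]])
```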

The payload extractor unit 430 may subsequently be utilized to extract the payload (e.g. information content) from the detected watermark signal. Once the unit has estimated the two correlation peaks cL₁ and cL₂ that exceed the detection threshold, an estimate pL′ of the circular shift pL (defined in equation (6)) is derived as the distance between the peaks. Next, the signs ρ₁ and ρ₂ of the correlation peaks are determined, and hence r_(sign) is calculated from equation (7). The overall watermark payload may then be calculated using equation (8).

For instance, it can be seen in FIG. 15 that pL is the relative distance between the two peaks. Both peaks are positive, i.e. ρ₁=+1 and ρ₂=+1. From equation (7), r_(sign)=3. Consequently, the payload pL_(w)=<3, pL>.

It will be appreciated by the skilled person that various implementations not specifically described would be understood as falling within the scope of the present invention. For instance, whilst only the functionality of the detecting apparatus has been described, it will be appreciated that the apparatus could be realized as a digital circuit, an analog circuit, a computer program, or a combination thereof.

Equally, whilst the above embodiment has been described with reference to an audio signal, it will be appreciated that the present invention can be applied to add information to other types of signal, for instance information or multimedia signals, such as video and data signals.

Further, it will be appreciated that the invention can be applied to watermarking schemes containing only one watermarking sequence (i.e. a 1-bit scheme), or to watermarking schemes containing multiple watermarking sequences. Such multiple sequences can be simultaneously or successively embedded within the host signal.

Within the specification it will be appreciated that the word “comprising” does not exclude other elements or steps, that “a” or “an” does not exclude a plurality, and that a single processor or other unit may fulfil the functions of several means recited in the claims.

1. A method of compensating for a linear time scale change in a received signal, the signal being modified by a sequence of symbols in the time domain, the method comprising the steps of: (a) extracting an initial estimate of the sequence of symbols from said received signal; (b) forming an estimate of a correctly time scaled sequence of the symbols by interpolating the values of said initial estimate.
2. A method as claimed in claim 1, wherein step (b) is repeated so as to provide a range of estimates corresponding to different time scalings.
3. A method as claimed in claim 1, wherein said interpolation is at least one of zeroth order interpolation, linear interpolation, quadratic interpolation and cubic interpolation.
4. A method as claimed in claim 1, the method further comprising the step of processing each estimate as though it were the correctly time scaled sequence of the symbols, so as to determine which estimate is the best estimate.
5. A method as claimed in claim 1, the method further comprising the steps of correlating each of said estimates with a reference corresponding to said sequence of symbols; and taking the estimate with the maximum correlation peak as the best estimate.
6. A method as claimed in claim 1, wherein said initial estimate of the sequence of symbols is stored in a buffer.
7. A method as claimed in claim 6, wherein said buffer is of total length M, the total number of scale searches conducted is $N_{\eta} = \frac{M}{2}(\eta_{\max} - \eta_{\min})$ where η_(min), η_(max) correspond respectively to the minimum and maximum likely time scale modifications of the signal.
8. A method as claimed in claim 1, wherein said initial estimates of the sequence of symbols comprises a sequence of N_(b) estimates for each symbol, each of the N_(b) estimates corresponding to a different time offset of a symbol.
9. A method as claimed in claim 1, wherein the scale search in the next detection window is adapted based on the information acquired during the current detection window.
10. A method as claimed in claim 1, wherein the scale space is searched using an optimal searching algorithm.
11. A method as claimed in claim 10, wherein the searching algorithm is the grid refinement algorithm.
12. A computer program arranged to perform the method as claimed in claim 1.
13. A record carrier comprising the computer program as claimed in claim 12.
14. A method of making available for downloading a computer program as claimed in claim 12.
15. An apparatus arranged to compensate for a linear time scale change in a received signal, the signal being modified by a sequence of symbols in the time domain, the apparatus comprising: an extractor arranged to extract an initial estimate of the sequence of symbols from said received signal; and an interpolator arranged to form an estimate of a correctly time scaled sequence of the symbols by interpolating the values of said initial estimate.
16. An apparatus as claimed in claim 15, the apparatus further comprising a buffer arranged to store one or more of said estimates.
17. A decoder comprising the apparatus as claimed in claim 15.